21. Storing Data

Storing usually comes after cleaning, but it isn't always necessary, which is why it isn't considered a core part of the data wrangling process. Sometimes you analyze and visualize the cleaned data and leave it at that, without saving it.

Because storing is performed on cleaned data, we could have covered it at the end of Lesson 4 ("Cleaning Data"). But since this lesson covers file formats, let's cover it here.

Imagine you've assessed and cleaned your data. Cleaning included merging the separate pieces of data into one dataset, which, as I mentioned in the last video, I took care of behind the scenes for you. What do you want to do next?

The advantages and disadvantages of flat files were discussed earlier in the lesson in the Flat File Structure concept. One of the advantages:

Great for small datasets.

And one of the disadvantages:

Sharing data can be cumbersome.

Given the size of this dataset and that it likely won't be shared often, saving to a flat file like a CSV is probably the best solution. With pandas, saving your gathered data to a CSV file is easy. The DataFrame method to_csv is all you need, and the only argument required to save a file on your computer is the path where the file should be written. You will often want index=False as well, so that the DataFrame index doesn't show up as an extra column in your stored dataset. If you had a DataFrame, df, and wanted to save it to a file named dataset.csv with no index column:

df.to_csv('dataset.csv', index=False)
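
To see what index=False changes, here is a minimal sketch that writes the same DataFrame twice and compares the header rows. The DataFrame contents are made up for illustration (they are not the lesson's dataset), and the output goes to in-memory buffers rather than files so the two results are easy to compare side by side.

```python
import io

import pandas as pd

# Hypothetical stand-in for a cleaned DataFrame; titles and scores are invented.
df = pd.DataFrame({"title": ["Movie A", "Movie B"], "score": [98, 95]})

# Write to in-memory text buffers instead of disk.
with_index = io.StringIO()
df.to_csv(with_index)                   # default: index is written as the first column
without_index = io.StringIO()
df.to_csv(without_index, index=False)   # index is left out

header_with_index = with_index.getvalue().splitlines()[0]
header_without_index = without_index.getvalue().splitlines()[0]
print(header_with_index)     # ,title,score
print(header_without_index)  # title,score

# Reading the index-free CSV back gives the original columns and values.
restored = pd.read_csv(io.StringIO(without_index.getvalue()))
print(list(restored.columns))
```

The leading comma in the first header is the unnamed index column. If you read that version back with read_csv, it would typically show up as an extra column named "Unnamed: 0", which is why index=False is so often the right choice when the index carries no information.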

Quiz

In the Jupyter Notebook below, store the lesson's master DataFrame in a file called bestofrt_master.csv. After you do that, open the Jupyter Notebook Dashboard (click jupyter in the top left-hand corner of the notebook) and look for the saved file.
